In this paper, we propose a thermodynamic mechanism for the formation of transcriptional foci via the joint
agglomeration of DNA-looping proteins and protein-binding domains on DNA: The competition between the gain in
protein-DNA binding free energy and the entropy loss due to DNA looping is argued to result in an effective
attraction between loops. A mean-field approximation can be described analytically via a mapping to a restricted
random-graph ensemble having local degree constraints and global constraints on the number of connected components.
It shows the emergence of protein clusters containing a finite fraction of all looping proteins. If the entropy
loss due to a single DNA loop is high enough, this transition is found to be of first order.

I. INTRODUCTION



Understanding the spatial organization of DNA in the cell or the cellular nucleus, and its relation to transcription, is
one of the big challenges in cell biology. In this context, the experimental observation of transcription foci
is of great interest: The transcriptional activity is not evenly distributed inside the cell, but is concentrated in focal
points around so-called transcription factories [3]. These factories contain multiple copies of RNA polymerases,
transcription factors and parts of the machinery for post-transcriptional RNA modifications. In order to be transcribed,
DNA has to loop back to these transcription factories; it is expected that one factory is surrounded by about 10-20
DNA loops. In this and related phenomenological pictures [3], the formation of transcription factories and DNA
looping are considered to be of fundamental importance for the large-scale spatial organization of the transcriptional
activity. A sound theoretical understanding grounded on simple physical mechanisms is, however, missing.

The major reason for the increased efficiency of transcription by transcription factories is the following: A locally
increased concentration of transcription factors close to target genes enhances the recognition of transcription factor
binding sites, since the volume a transcription factor has to search before finding a target gene is substantially decreased.
In bacteria, local concentration effects are partially achieved by the co-localization of genes coding for a transcription
factor and their target genes along the one-dimensional chromosome itself [6]. By looping, target genes which are far
apart along the genome can be brought together in cellular or nuclear space. Very recently, it was shown experimentally
that compact DNA conformations actually enhance target localization compared to stretched conformations.
This observation lends strong support to the importance of coupling transcriptional activity to the spatial DNA
organization.

The formation of single DNA loops and its consequences for gene regulation have recently been at the center
of interest of many biophysical research works. These range from precise numerical descriptions of the looping
properties of DNA or chromatin fibers up to the thermodynamic modeling of mechanisms for transcriptional
gene regulation. Both direct looping by bivalent transcription factors (e.g. the lac repressor) [10] and looping
via attractive protein-protein interactions between DNA-bound proteins have been studied. The latter process
is important in particular in distal gene regulation in eukaryotic cells.

In this paper, we assume a more global point of view: May DNA loops and looping proteins agglomerate collectively
to give rise to transcriptional foci? What are the thermodynamic ingredients leading to such an agglomeration? In
this context, we model the DNA as a string containing many protein binding domains (BD), each one composed of $K$
binding sites (BS). In this work we consider only bivalent DNA-binding proteins which are able to bind simultaneously
to two different BDs, thus introducing a DNA loop. Fig. 1 summarizes the basic model ingredients. We find that this
simple model leads to an effective attraction between DNA loops and thus to the formation of protein agglomerates.

The role of multiple binding sites for a single loop has already been studied by Vilar et al., whereas multiple
loops have been considered by Hanke and Metzler [3], but for BDs with only $K = 1$ BS. We will show that only the
combination of both is able to produce the desired emergence of protein agglomerates. This raises an interesting
question: Given a concentration of looping proteins and an entropy cost of bringing two binding domains into close
vicinity, is it possible to get an agglomerate of binding domains? Besides being of interest for transcription factories,
the question is actually very general and interesting by itself when reformulated differently: If we consider each BD
to be a monomer, then the problem is equivalent to understanding the effect of introducing $L$ non-nearest-neighbour
links between $N$ monomers, with global constraints resulting from the entropy loss due to DNA looping, and local
constraints due to the structure assumed for the BD.

In the following Sec. II we first discuss the basic mechanism for protein agglomeration resulting from the combi-
FIG. 1: Schematic representation of a single DNA loop with one looping protein: The looping protein binds to single binding
sites in two binding domains (each binding domain has $K$ binding sites), leading to a binding free-energy gain of $f_b$. A DNA
loop leads to an entropy loss $s$.

approach developed by Engel et al. [15]. In Sec. V we discuss the results of our mean-field model in the context of
transcription factories and compare them with the known results for the collapse of randomly linked polymers.

II. THE BASIC MECHANISM 

As shown in Fig. 1, there are two competing effects related to DNA looping: First, the binding of a linking protein
introduces a free-energy difference $-f_b$ (for example, in the case of the lac operon $f_b$ is of order 10-15 kcal/mol [3]). The
second contribution comes from the fact that each loop reduces the conformational entropy of the DNA. A link thus
leads to a total free-energy difference of $\Delta F = -f_b + Ts$, with $T$ being the temperature and $s$ being the entropy loss.
In principle $s$ depends on the length of the loop and on the DNA stiffness. For this qualitative argument we
neglect this dependence and use the entropy loss of a typical-length loop.

Now, as shown in Fig. 2, we introduce a second loop, and the total free-energy difference with respect to the unlooped
configuration becomes $\Delta F = -2f_b + 2Ts$. There are two possible cases for the relative positions of the two loops: First,
the loops are distant, and the binding of another linker protein has to introduce a new loop. Second, the loops share
one BD. Then the unconnected BDs of the two loops may also be linked, cf. cases (c) and (d) in the figure. In this
case, binding free energy is gained, but no new loop is introduced, i.e., no further entropy is lost. We thus have a free
energy $\Delta F = -3f_b + 2Ts$, which is lower than the one achievable by distant loops. This mechanism introduces an
effective attraction between binding domains of loops: A cluster of $n$ loops might be connected by $n(n+1)/2$ proteins,
so the binding free energy can grow quadratically in $n$, whereas the entropy loss grows only linearly. Note that this
picture is based on the simple observation of the multiplicity of protein binding sites in a binding domain on DNA.
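
The counting argument of this section can be illustrated by a short numerical sketch. The function names and parameter values below are illustrative assumptions and are not taken from the paper; the cluster formula additionally assumes that every BD has enough binding sites ($K \ge n$) to host all $n(n+1)/2$ links.

```python
def free_energy_distant(n, f_b, T, s):
    """n separate loops: one bound protein and one entropy loss per loop."""
    return -n * f_b + n * T * s


def free_energy_cluster(n, f_b, T, s):
    """n loops joining n+1 BDs that are all pairwise linked.

    Assumes K >= n, so that each BD can host all of its n links.
    The entropy loss is still that of n loops only.
    """
    return -(n * (n + 1) // 2) * f_b + n * T * s


if __name__ == "__main__":
    f_b, T, s = 1.0, 1.0, 2.0  # illustrative values (units with k_B = 1)
    for n in (1, 2, 3, 5):
        print(n, free_energy_distant(n, f_b, T, s), free_energy_cluster(n, f_b, T, s))
```

For $n = 2$ this reproduces the two cases discussed above, $-2f_b + 2Ts$ for distant loops and $-3f_b + 2Ts$ for loops sharing a BD.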

III. A MEAN-FIELD DESCRIPTION VIA RANDOM PROTEIN-CONNECTION GRAPHS 

To gain a first understanding of the action of this effective attraction, we set up a mean-field model. The entropy 
loss s due to the introduction of a loop is assumed to be independent of the one-dimensional distance between two 
BDs measured along the DNA chain. Note that this approximation would be exact for monomers in a box. 

On this level, BDs can be seen as vertices of a protein-connection graph, and each bound protein between two such
vertices forms an edge. We assume $L$ proteins to be bound. The entropy loss due to this linking depends on the
component structure of the graph: A connected component (CC) of $n$ vertices contains $n-1$ loops. Denoting the
number of CCs of $n$ vertices by $C(n)$, and the total vertex number by $N$, we find that the free-energy difference with
respect to the loop-free system is

$$\Delta F = -L f_b + T s \sum_{n=1}^{N} (n-1)\, C(n) = -L f_b - T C s + \mathrm{const.} \tag{1}$$

FIG. 2: Basic agglomeration mechanism: (a) DNA is represented by a string, binding proteins by linkers. (b) Binding a protein
to two BDs leads to a gain of binding free energy, but causes a loss in DNA conformational entropy. (c) The same happens if
a second loop is introduced. (d) Now, if one BD is common to the loops, a further protein may bind to the still unlinked BDs
without major entropy losses.


with $C = \sum_n C(n)$ being the total number of CCs. This free energy has two competing negative contributions. The
first term favors large $L$ by binding more proteins, and its ground state would be the fully connected graph, which
has only one CC. The second contribution in Eq. (1) favors many components for positive $s$. Its ground state is thus the
empty graph with each of the $N$ isolated vertices as a CC. The global behavior of the model is given by the balance
of these two terms, and can be characterized by the partition sum running over all graphs,

$$Z = \sum_{\rm graphs} \exp\{ L f_b / T + C s \} . \tag{2}$$

We note that this partition function describes a modified random-graph ensemble which depends only on the number
of links and the number of CCs. In fact, in usual diluted random graphs each pair of vertices is connected with
some probability $0 < p < 1$, and left unconnected with probability $1-p$. The probability of a specific graph with $L$ edges
is then proportional to $[p/(1-p)]^L$, so it is exponential in the number of edges. If, furthermore, we reweight all graphs by a
factor $q^C$, we find that the graphs have a probability corresponding to Eq. (2) by identifying $p/(1-p) := e^{f_b/T}$ and
$q := e^s$. Furthermore, the sum over all graphs is restricted by the connectivity constraint: At most $K$ proteins can be
bound to one BD. For $d$ bound proteins, the distinguishable nature of the BS inside the BD results in a combinatorial
factor $K!/(K-d)!$. In the next section we give an analytical description of this problem for any $K$.

The main question of this work, i.e. the question if an agglomerate exists or not, translates to the problem of graph 
percolation: Agglomeration is equivalent to the existence of an extensively large connected component in the graph, 
i.e. to the existence of a giant component. 
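
As a minimal sketch of this graph picture, the snippet below evaluates the free-energy difference of Eq. (1), up to the additive constant, for an explicit edge list, together with the fraction of vertices in the largest CC. All function names are hypothetical helpers introduced for illustration, and the union-find component search is simply one convenient choice.

```python
from collections import Counter


def component_sizes(n_vertices, edges):
    """Sizes of all connected components (largest first), via union-find."""
    parent = list(range(n_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    counts = Counter(find(v) for v in range(n_vertices))
    return sorted(counts.values(), reverse=True)


def free_energy(n_vertices, edges, f_b, T, s):
    """Delta F = -L f_b + T s (N - C): a CC of n vertices contributes n-1 loops."""
    n_components = len(component_sizes(n_vertices, edges))
    loops = n_vertices - n_components
    return -len(edges) * f_b + T * s * loops


def largest_component_fraction(n_vertices, edges):
    """Fraction of vertices (BDs) in the largest CC."""
    return component_sizes(n_vertices, edges)[0] / n_vertices
```

For example, six BDs with links `[(0, 1), (1, 2), (0, 2), (3, 4)]` form components of sizes 3, 2 and 1, i.e. three loops bound by four proteins.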
IV. ANALYTICAL DESCRIPTION OF THE GRAPH ENSEMBLE



Before coming to the full description of the problem, i.e. to random graphs which have restricted degrees and are
reweighted according to their number of CCs, we concentrate for a moment on the case $q = 1$, i.e. without considering
the number of CCs. The basic idea is that the number of CCs can be introduced at a later stage by considering large
deviations from typical $q = 1$ graphs. The approach generalizes the cavity-type calculation of [15], which is the special
case $K \to \infty$. We will resort to this limit in order to check the correctness of our results.



A. Graphs at q = 1

First, we describe the graphs without any constraint coming from the number of CCs, i.e. without entropy losses
in the DNA-looping model. In this case we have $s = 0$, i.e. $q = 1$. The graphs have $N$ vertices (or BDs), each of
them containing $K$ distinguishable BS (called stubs in the following), which allow vertices to have degrees up to $K$. We
describe graphs at the level of vertices by their symmetric adjacency matrix $\{J_{ij}\}$ with entries 1 whenever two vertices
are connected via any two of their BS, and 0 otherwise. The distinguishable nature of the binding sites is taken into account
by a combinatorial factor $K!/(K-d)!$ for any vertex of degree $d$. This factor counts the number of non-equivalent
ways the $d$ edges can be attached to the $K$ BS.

The statistical properties of a graph $G$ with adjacency matrix $\{J_{ij}\}$ can be characterized by its number of links

$$L(G) = \sum_{i<j} J_{ij} \tag{3}$$

and its degree distribution given via the number $N_d$ of vertices of degree $d$,

$$N_d(G) = \sum_{i} \delta\Big(d, \sum_{j} J_{ij}\Big) , \qquad d = 0, \dots, K , \tag{4}$$

with the notation $\delta(\cdot,\cdot)$ for the Kronecker symbol. Obviously these quantities are not independent. We have
$\sum_d N_d = N$ and $\sum_d d\,N_d = 2L$, since links are counted twice by adding up all degrees. Without reweighting graphs
by the number of CCs, the graph ensemble is completely characterized by these quantities. In fact we write
where the last line already takes into account that degrees beyond $K$ are forbidden. In this notation, $\gamma$ acts as a
chemical potential for links, and corresponds to the binding free energy in the protein case. Note that the combination
$\gamma/(K^2 N)$ is chosen such that, for $K \to \infty$, we recover normal Erdős-Rényi random graphs with average degree $\gamma$. In
this limit, all our results here have to coincide with the ones of [15].

From this microscopic description by the adjacency matrix of a graph, we can go directly to a coarser description
giving the probability of a graph to have some degree distribution $N_d = p_d N$. We find
(6)



This equation is obtained by multiplying the probability of a single of these graphs by the number of possible 
realizations of the degree sequence. The factors are, in the order of appearance, the number of ways to assign 
degrees to the vertices, the number of possibilities to wire a set of vertices with given degrees, and a correction of 
assigned $d$ stubs (or half-edges). These are brought into a random permutation, and the first stub becomes connected
to the second one, the third to the fourth etc. This procedure overcounts graphs: A factor $1/2^L$ accounts for possible
permutations of the two stubs inside each of the $L$ links, the factor $1/L!$ for permutations of entire links. This procedure
does not forbid double links or self-loops, but these are rare and therefore do not influence the global statistical features
of the graph ensemble. Using Stirling's formula $\ln N! = N \ln N - N + o(N)$ we can rewrite this as
(7)



where we have used $\ell = L/N = \frac{1}{2} \sum_d d\, p_d$. The typical degree distribution, which is realized in this ensemble with
probability tending to one in the thermodynamic limit, can be evaluated by the maximum of this expression. Differentiating
the exponent with respect to $p_d$, and using a Lagrange multiplier to ensure normalization of the degree distribution, we arrive
directly at (overbars always denote typical values)



$$\bar p_d = \binom{K}{d} \frac{z^d}{(1+z)^K} . \tag{8}$$



With the average degree determined from this distribution,
(9)



we arrive at two self-consistent equations for $z$ and $\bar d$ which are solved by (a second, non-physical solution is not shown)



1 + 4— - 1 
(10)



Note that, for $K \to \infty$, the degree distribution (8) tends as expected to the Poissonian law $e^{-\gamma} \gamma^d / d!$. These results
also allow us to express the dominant contribution to the partition function $Z(\gamma, K, N)$ by the exponent in Eq. (7)
evaluated at the saddle point:



In Z(7, K,N) = In 1 hi l H In -! ^ o(l) 

KN ^ ^ I I K 2K \KK 2K ^ ' 



(11) 
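
The binomial form of the typical degree distribution, $\bar p_d = \binom{K}{d} z^d/(1+z)^K$, and its Poissonian limit can be checked numerically with the following sketch. Since the relation between $z$ and $\gamma$ is not reproduced here, the sketch parametrizes $z$ by the mean degree instead, using the binomial mean $\bar d = Kz/(1+z)$; this parametrization and the function names are assumptions made for illustration only.

```python
import math


def typical_degree_distribution(K, mean_degree):
    """Binomial degree law p_d = C(K, d) z^d / (1 + z)^K for d = 0..K.

    z is fixed by the mean of this distribution, mean_degree = K z / (1 + z).
    Probabilities are evaluated in log space to avoid overflow for large K.
    """
    if not 0 < mean_degree < K:
        raise ValueError("mean_degree must lie strictly between 0 and K")
    z = mean_degree / (K - mean_degree)
    log_norm = K * math.log1p(z)
    probs = []
    for d in range(K + 1):
        log_binom = (math.lgamma(K + 1) - math.lgamma(d + 1)
                     - math.lgamma(K - d + 1))
        probs.append(math.exp(log_binom + d * math.log(z) - log_norm))
    return probs


def poisson(mean, d):
    """Poisson probability e^{-mean} mean^d / d!."""
    return math.exp(-mean) * mean ** d / math.factorial(d)


if __name__ == "__main__":
    p = typical_degree_distribution(5, 2.0)
    print(sum(p), sum(d * pd for d, pd in enumerate(p)))
    big = typical_degree_distribution(2000, 1.5)
    print(max(abs(big[d] - poisson(1.5, d)) for d in range(10)))
```

For small $K$ the hard cutoff at degree $K$ is clearly visible, whereas for large $K$ at fixed mean degree the distribution becomes indistinguishable from the Poisson law.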



B. The case q ≠ 1

Now we have all the results needed for analyzing the full model including entropy losses. In contrast to the
case discussed so far, the full graph ensemble includes a weight $q^{C(G)}$ depending on the number $C(G)$ of connected
components of graph $G$. We therefore consider the modified ensemble



$P(G|\gamma, q, K, N)$
(12)



where the normalizing partition function is given as 



(13) 
Compared to the previous section, this graph ensemble is hard to handle: The distribution depends on the global
quantity $C(G)$, which cannot be calculated as easily as the degree distribution, which was sufficient to characterize the
graph ensemble at $q = 1$. To get information, we modify the cavity-type approach of [15]. There, information on
the graph ensemble with $C$-dependent weight (but without degree constraint, i.e. the case $q \neq 1$ and $K \to \infty$) was
obtained via a simple and intuitive idea: If we add a new vertex to a typical graph of size $N$, and the degree of this
vertex is randomly selected according to the typical degree distribution of size-$N$ graphs, the new graph is basically
equivalent to a typical graph of size $N+1$. Unfortunately, this argument holds only approximately in our case: The
added vertex does not become a typical one of the enlarged graph. However, since it holds for $K \to \infty$ and for $K = 1$,
we expect it to be rather precise also for intermediate values of $K$.

Assume therefore that we add a vertex to a graph $G$ of $N$ vertices. The degree $d$ of this vertex is drawn randomly
from $\bar p_d = \binom{K}{d} z^d/(1+z)^K$ with $z$ given by Eq. (10) (graph ensemble at $q = 1$). Now, the number of components
changes according to

$$P(C|\gamma, K, N+1) = \sum_{\Delta C} D(\Delta C)\, P(C + \Delta C|\gamma, K, N) . \tag{14}$$

The kernel $D(\Delta C)$ can be decomposed into various contributions according to the degree $d$ of the new vertex, and
the number $d_0$ of links this new vertex makes with the giant component of $G$. The change $\Delta C$ of the number of CCs
also depends on these two numbers. For $d_0 > 0$, we add $d - d_0$ small components to the giant one, i.e. we
have $\Delta C = d - d_0$. For $d_0 = 0$, we unify $d$ small components into a single one, and we have $\Delta C = d - 1$. The kernel
therefore results in

$$D(\Delta C) = \sum_{0 \le d_0 \le d \le K} D(\Delta C, d, d_0) ,$$

$$D(\Delta C, d, d_0) = \alpha\, \bar p_d \binom{d}{d_0} \pi^{d_0} (1-\pi)^{d-d_0}\, \delta\big(\Delta C,\, d - d_0 - \delta(d_0, 0)\big) , \tag{15}$$

with $\pi$ denoting the probability of selecting an end-vertex inside the giant component. Due to the special definition
of our ensemble, where we do not select vertices directly but free BS associated to a vertex, $\pi$ equals
the fraction of all free BS being inside the giant component. It has a simple relation to the fraction $\nu$ of
vertices belonging to the giant component:

$$\pi = \nu\, \frac{K - \bar d_{\rm in}}{K - \bar d} . \tag{16}$$

Here $\bar d_{\rm in}$ denotes the average degree inside the giant component, $\bar d$ the one of the full graph.

The reweighting factor $\alpha$ can be calculated exactly in the case $K \to \infty$, cf. [15]. Its precise value is not of interest
in our discussion. If we multiply Eq. (14) by $q^C$ and sum over $C$, we obtain

$$Z(\gamma, q, K, N+1) = \sum_{\Delta C} D(\Delta C)\, q^{-\Delta C}\, Z(\gamma, q, K, N) = \zeta(\gamma, q, K)\, Z(\gamma, q, K, N) . \tag{17}$$

The logarithm of $\zeta(\gamma, q, K)$ can be interpreted as a free-energy shift in the graph ensemble due to adding a new vertex.
Using Eq. (15) it can be calculated right away, resulting in

Note that, due to the concentration of intensive quantities to their typical values in the summation over all graphs,
we have replaced $\pi$ by its saddle-point value $\bar\pi$.

The decomposition of $D(\Delta C)$ helps to get more detailed insight into the graph structure. In fact, the quantity

describes an effective single-vertex Boltzmann factor, and single-vertex quantities such as the probability of belonging to
the giant component or the degree distribution can be derived from it.

To start with the giant component size, we recall that for all $d_0 > 0$ the newly added vertex becomes connected
to the giant component, and thus is part of it in the $(N+1)$-vertex graph. Therefore the fraction of vertices not
belonging to the giant component can be written as

The degree distribution of the graph results in
(21)



For $q \neq 1$, this distribution deviates from a simple binomial distribution. The average vertex degree follows immediately.

where we have used Eq. (20) to simplify the expression in the second line. The degree distribution inside (resp.
outside) the giant component can be obtained by restricting sums to $d_0 > 0$ (resp. $d_0 = 0$),



$$p_{\rm in}(d|\gamma, q, K) = \frac{\sum_{d_0=1}^{d} \zeta(d, d_0|\gamma, q, K)}{\sum_{d'=1}^{K} \sum_{d_0=1}^{d'} \zeta(d', d_0|\gamma, q, K)} , \qquad
p_{\rm out}(d|\gamma, q, K) = \frac{\zeta(d, d_0 = 0|\gamma, q, K)}{\sum_{d'=0}^{K} \zeta(d', d_0 = 0|\gamma, q, K)} .$$

For the outside average degree we obtain the particularly simple result
(24)



For the inside average degree, we use the fact that the total average degree is the weighted average of the inside and
outside degrees, and we find

$$\bar d_{\rm in} = \frac{\bar d - \bar d_{\rm out} (1 - \nu)}{\nu} . \tag{25}$$



C. The phase diagram 

We now have enough equations to determine the phase diagram self-consistently. Putting together Eqs. (16), (20), (22), (24)
and (25), and directly eliminating the expression for $\bar d_{\rm in}$, we find a closed set of three equations

We had introduced the model as a function of the parameter $\gamma$, which can be understood as a chemical potential coupled
to the number of edges in the graph. Since this parameter has no very obvious interpretation due to its interaction
with the degree constraints, we prefer to use its conjugate quantity, the link density $\ell$, as a control parameter. In this
sense, for given $K$, the phase diagram is spanned by $q$ and $\ell$, and Eqs. (26) allow us to determine the unknown quantities
$\nu$, $\bar\pi$ and $\gamma$.

Eqs. (26) always have the solution $\{\nu, \bar\pi, \gamma\} = \{0, 0, 2\ell q K/(K - 2\ell)\}$. It corresponds to a phase without any extensive
CC, i.e. to a non-agglomerated phase. For large enough $\ell$ and $q = e^s$, also other solutions exist. To see this, we
expand Eqs. (26) up to second order in $\nu$ and $\bar\pi$, and find in particular

$$\bar\pi = - \frac{K\,[2\ell(K-1) - K]}{2\ell^2 (K-1)\,[2 + K(q-2)]} , \tag{27}$$

which implies a continuous transition to a non-trivial solution at

$$\ell_c = \frac{K}{2(K-1)} , \qquad \forall\, q < q_c = 2 - \frac{2}{K} . \tag{28}$$

Note that, for $K \to \infty$ and $q = 1$, this result reproduces the known percolation result in Erdős-Rényi random graphs.
For $q < q_c$ and all $K$, Eq. (27) implies that $\bar\pi \sim (\ell - \ell_c)$ near the transition point. At $q = q_c$, we can expand Eqs. (26)
up to third order in $\nu$ and $\bar\pi$. We find that there is a percolation point at the same value of $\ell_c$ as for $q < q_c$, but with
$\bar\pi \sim (\ell - \ell_c)^{1/2}$. Note that this transition exists for all $K > 2$; at $K = 2$ itself the transition point would be $\ell_c = 1$,
i.e. an average degree $2\ell_c = 2$, which equals the highest possible degree in this graph (due to $2\ell \le K$).
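
The location of the continuous transition can be tabulated with a few lines of code. The sketch below assumes that the critical link density reads $\ell_c = K/[2(K-1)]$ and that $q_c = 2 - 2/K$ is the value of $q$ at which the bracket $2 + K(q-2)$ of the second-order expansion vanishes; both expressions are our reading of the formulas above, and the function names are illustrative.

```python
def critical_link_density(K):
    """Assumed continuous-transition point l_c = K / (2 (K - 1)), for K >= 2."""
    if K < 2:
        raise ValueError("K must be at least 2")
    return K / (2.0 * (K - 1))


def critical_q(K):
    """Assumed end point q_c = 2 - 2/K of the continuous-transition line."""
    if K < 2:
        raise ValueError("K must be at least 2")
    return 2.0 - 2.0 / K


if __name__ == "__main__":
    for K in (2, 3, 5, 10, 100):
        print(K, critical_link_density(K), critical_q(K))
```

Under these assumptions, $K = 2$ gives $\ell_c = 1$ (average degree equal to the maximal degree), while $K \to \infty$ gives $\ell_c \to 1/2$, i.e. the Erdős-Rényi percolation threshold at average degree one.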

For $q > q_c$, Eq. (27) does not make sense. We find a discontinuous transition at a smaller link density, which depends
on $q$ and $K$ and has to be determined from Eqs. (26) via the spinodal point; the transition point can be obtained with
good precision using symbolic manipulation software like Mathematica [18]. In this case, the largest component jumps
from a non-extensive size to a finite fraction of the full system. In Fig. 3 the phase diagram for various values of $K$ is
given. It is found to be qualitatively similar for all $K \ge 3$, but agglomeration is favored for higher-order BDs at the same
number of links. In the inset of the figure we also show the discontinuous nature of the transition: At a given number of
links the parameter $q$ is changed, and the size of the largest CC is recorded. We find an excellent agreement between MC
simulations of random graphs using Metropolis-type rewiring steps, and the analytical results obtained from Eqs. (26). This
illustrates the quality of the approximation made in the analytical approach under vertex addition.
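
A Metropolis-type rewiring scheme of the kind mentioned above could look as follows. This is a minimal sketch and not the simulation code used for the figures: it assumes a fixed number of links, a graph weight $q^{C} \prod_i K!/(K-d_i)!$ with degrees restricted to $d_i \le K$, and a move that replaces one uniformly chosen link by a uniformly chosen vertex pair. All names are hypothetical, and the component count is recomputed from scratch at every step, which is only practical for small graphs.

```python
import math
import random


def count_components(n, edges):
    """Number of connected components, via union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    comps = n
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            comps -= 1
    return comps


def log_stub_factor(K, d):
    """ln[K!/(K-d)!]: ways to attach d links to K distinguishable sites."""
    return math.lgamma(K + 1) - math.lgamma(K - d + 1)


def random_pair(n, rng):
    u, v = rng.sample(range(n), 2)
    return (u, v) if u < v else (v, u)


def metropolis_rewiring(n, n_links, K, q, steps, seed=0):
    """Sample graphs with n_links >= 1 links; assumes n_links well below n*K/2."""
    rng = random.Random(seed)
    edges, present, deg = [], set(), [0] * n
    while len(edges) < n_links:  # random admissible starting graph
        e = random_pair(n, rng)
        if e not in present and deg[e[0]] < K and deg[e[1]] < K:
            edges.append(e)
            present.add(e)
            deg[e[0]] += 1
            deg[e[1]] += 1
    comps = count_components(n, edges)
    for _ in range(steps):
        i = rng.randrange(n_links)
        old, new = edges[i], random_pair(n, rng)
        if new in present:  # occupied pair (or the old link itself): no move
            continue
        delta = {}
        for v in old:
            delta[v] = delta.get(v, 0) - 1
        for v in new:
            delta[v] = delta.get(v, 0) + 1
        if any(deg[v] + dv > K for v, dv in delta.items()):
            continue  # degree constraint violated: weight zero
        trial = edges[:i] + [new] + edges[i + 1:]
        new_comps = count_components(n, trial)
        log_acc = (new_comps - comps) * math.log(q)
        log_acc += sum(log_stub_factor(K, deg[v] + dv) - log_stub_factor(K, deg[v])
                       for v, dv in delta.items())
        if log_acc >= 0 or rng.random() < math.exp(log_acc):
            present.remove(old)
            present.add(new)
            edges[i] = new
            for v, dv in delta.items():
                deg[v] += dv
            comps = new_comps
    return edges, comps
```

Because both the link to be removed and the candidate pair are drawn uniformly, the proposal is symmetric and the plain Metropolis acceptance rule applies. For system sizes such as those quoted for the simulations, an incremental update of the component structure would be preferable to the full recount used here.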

It is very interesting that the phase transition appears at smaller $\ell$ for higher entropy losses $s = \ln q$. The reason is
that an increased $s$ leads to a compaction of the giant component, where links can be added without losing further
CCs. Therefore, even if the transition appears at lower global average degree, the average degree inside the largest
CC always exceeds two. Again, this fact illustrates why $K > 2$ is essential for agglomeration.

The left panel of Fig. 4 shows the average degrees inside and outside the giant component for $q = 10$ and
for different values of $K$. For all $K \ge 3$, both $\bar d_{\rm in}$ and $\bar d_{\rm out}$ clearly show a discontinuous jump at a critical
$K$-dependent value of $\ell$. On the one hand, $\bar d_{\rm in}$ always jumps to a value slightly above two, as is necessary for the
largest component to be connected. The degree outside the giant component, on the other hand, jumps to values which become
smaller and smaller with increasing $K$. In the right panel of Fig. 4 we also show the fraction of vertices resp. edges being in the
giant component. For higher $K$ values, the fraction of vertices becomes smaller at the transition point, whereas the
fraction of links becomes larger. This illustrates that, for larger $K$, the agglomerate at the threshold becomes smaller
but more dense.

Similarly, it is illustrative to look at the properties of the giant component for various values of $q$. Fig. 5 shows
the plots for $K = 5$ at $q = 5$, 10 and 20. We see that, as $q$ increases, the fraction of vertices inside the giant component
goes down (right panel), but the fraction of edges goes up, implying that the agglomerate becomes more and more
compact with increasing $q$. Also the difference between the inside and outside degrees goes up as $q$ increases (left panel).



V. DISCUSSION 
FIG. 3: Phase diagram for protein agglomeration in mean-field description for $K = 3, 5, 10, 100$: Below the line, no extensive
CC exists. Above, a finite fraction of all links and vertices is collected in the largest CC. The transition is continuous on the
left, discontinuous on the right side of the diamonds. Inset: Fraction of vertices collected in the largest component as a function
of $q = e^s$, for $\ell = 0.2$ and $K = 20$. The full line is the analytical result of Eqs. (26), the symbols show results of MC simulations
for $N = 5000$; each symbol is an average over 900 independent equilibrium configurations.



the mechanism behind the formation of transcription factories, which are observed in transcriptionally active cells. In
our paper we show that two ingredients are sufficient: DNA-looping proteins which are able to bind simultaneously to
two (possibly distant) protein binding sites on the DNA, and binding domains on the DNA which contain, on average,
more than two binding sites each. In this case, the competition between free-energy gain by protein binding and
entropy loss by DNA looping is found to lead to an effective attraction between DNA loops. As a consequence,
binding domains and proteins agglomerate collectively.

In its minimal character, the model might miss some important properties of the biological system. As an example, 
we consider the number of doubly-bound proteins as one important control parameter, whereas the relevant parameter 
should be the total number of proteins - which would also include free and singly bound molecules. Using biologically 
reasonable parameters for the binding affinities (ca. 5-15 kcal/mol), we find in simulations that basically none of the 
proteins stay free and a large majority is doubly bound in the phase-transition region. It would be interesting if we 
could extend our analytical model accordingly. 

Furthermore, our model did not consider the specificity of the interaction between DNA-binding proteins and their
binding sites. This specificity may result in the simultaneous agglomeration of various specific transcription factories,
which actually is an important ingredient of the phenomenological pictures discussed in the literature. Including this
possibility in our model would be another interesting generalization, but it would not affect the very basic agglomeration mechanism.

Our model of agglomeration is similar to models of randomly linked polymers studied in the literature. Previous
studies mostly considered a Gaussian chain with randomly placed links using variational [19] and numerical methods
[20]. Bryngelson et al. [19], in their study based on a variational approach and scaling arguments, tried to argue that
for a Gaussian chain with soft links there is always a transition. They also argued that it is a continuous
transition that occurs above a threshold which is a product of the density of links and the logarithm of the average length
of the loop. This result implied that, for some arbitrary polymer, it is possible for the transition to occur at a vanishing
density of links ($\sim 1/\ln N$). This result was countered by Kantor et al. Based on scaling arguments, they argued
that the number of links $M$ necessary for a percolating collapsed phase to exist scales as $N^{\phi}$, with $\phi = 1 - 1/(d\nu)$, where
$d$ is the spatial dimension and $\nu$ is the exponent which describes the shape of the polymer [25] (radius of gyration
$R_g \sim N^{\nu}$). Also, for the probability of looping $p(l) \sim A/l^{a}$, with $1 < a < 2$, it was shown by Schulman and Newman [21]
that for $M < N/2$ no infinite percolating cluster exists. For $M > N/2$, percolation may or may not occur depending on the
values of $A$ and $a$.

Our solution corresponds to the case where we have ignored the length dependence of the entropy cost of the loop ($a = 0$),
and we find that there would always be a transition from the extended phase to the collapsed phase, though the nature of the
transition depends on the entropy cost ($s = -\ln A$) of the loop. Most surprisingly, the larger the cost of looping, the smaller

FIG. 4: Left panel: Average degrees $\bar d_{\rm in}$ inside (upper curves) and $\bar d_{\rm out}$ outside the giant component (lower curves), as a function
of the total link density $\ell$, for values $K = 3, 5, 10$ and fixed $q = 10$. Right panel: Fraction $\nu$ of vertices (lower curves) and
fraction of links (upper curve) for the same parameters as in the left panel.

but a first order transition for low concentration of links. 

We did simulations for a Gaussian chain in three dimensions: In our mean-field model we considered entropy losses
to be independent of the distance between the BDs measured along the DNA. The distance dependence of the entropy in vivo
is complicated. If we assume that the unlinked DNA behaves on long scales like a Gaussian chain, the entropy loss is
monotonically increasing in the loop length $l$ and scales as $q(l) = e^{s(l)} \sim (l/l_0)^{3/2}$, where $l_0$ is the minimum distance
between two ends of a loop. If we now look at a connected component of the $n$ vertices $\{i_1, \dots, i_n\}$ with $i_m < i_{m+1}$ for
all $1 \le m < n$, the entropy loss is given by $s(i_1, \dots, i_n) = \sum_{m=1}^{n-1} s(i_{m+1} - i_m)$. In our simulations, we find that there
is still a discontinuous transition, which depends on the choice of parameters (see Fig. 6). Since longer loops are suppressed
compared to shorter ones, one could expect CCs to be more localized in one-dimensional distance along the DNA. This
would correspond to Cook's picture where DNA loops around a factory form a kind of rosette, before the DNA goes to
the next factory. The logarithmic entropy dependence taken into account in our simulations is not sufficient for such
a localization.

Based on these simulations, we suggest that even when the length dependence is taken into account, there is a possibility
of a first-order transition to the agglomerate at a small density of links for high entropy cost. This could
be possible because of the larger (exponential) contribution of the distribution of links to the entropy in
comparison to the logarithmic dependence of the entropy on the length.

We have ignored the interaction between loops. It is not clear how important that could be. For example, in the
case of DNA denaturation, exact results which ignore the interaction between loops predict a continuous transition in
all dimensions less than four, whereas, using scaling arguments and taking the interaction of a loop with the rest of the chain
into account, Kafri et al. showed that the transition becomes first order in $d = 2$.

The present work can be extended in various directions. First, from a biological point of view it would be interesting
to go to more realistic modeling schemes (like worm-like chains for the DNA molecule) and to check the proposed
picture. Such a simulation would also allow one to introduce biologically realistic parameters for protein binding affinities
and entropy losses, and to locate such a realistic setting in the simplified mean-field phase diagram. However, current
simulations are concentrated on a single loop, so this task seems to pose a considerable numerical challenge. It
would be interesting to see if self-exclusion, depletion effects due to macromolecular crowding or restricted volume
would lead to a spatial localization of the agglomerate. A second direction could be the inclusion of diverse looping
proteins with specific binding sites on the DNA to see whether equilibrium thermodynamics can drive the creation of

FIG. 5: Left panel: Average degrees $\bar d_{\rm in}$ inside (upper curves) and $\bar d_{\rm out}$ outside the giant component (lower curves), as a function
of the shifted link density $\ell - \ell_c$, for fixed $K = 5$ and diverse values of $q = 5, 10, 20$. Right panel: Fraction $\nu$ of vertices (lower
curves) and fraction of links (upper curve) for the same parameters as in the left panel.



phase. 